2019 Journal article Open Access

A survey of methods for explaining black box models
Guidotti R., Monreale A., Ruggieri S., Turini F., Giannotti F., Pedreschi D.
In recent years, many accurate decision support systems have been constructed as black boxes, that is, as systems that hide their internal logic from the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem and, as a consequence, it explicitly or implicitly delineates its own definition of interpretability and explanation. The aim of this article is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help researchers find the proposals most useful for their own work. The proposed classification of approaches to open black box models should also be useful for putting the many open research questions in perspective.
Source: ACM computing surveys 51 (2019). doi:10.1145/3236009
DOI: 10.1145/3236009
DOI: 10.48550/arxiv.1802.01933
Project(s): SoBigData via OpenAIRE

2018 Report Open Access

Local rule-based explanations of black box decision systems
Guidotti R., Monreale A., Ruggieri S., Pedreschi D., Turini F., Giannotti F.
Recent years have witnessed the rise of accurate but obscure decision systems which hide the logic of their internal decision processes from the users. The lack of explanations for the decisions of black box systems is a key ethical issue, and a limitation to the adoption of machine learning components in socially sensitive and safety-critical contexts. Therefore, we need explanations that reveal the reasons why a predictor takes a certain decision. In this paper we focus on the problem of black box outcome explanation, i.e., explaining the reasons for the decision taken on a specific instance. We propose LORE, an agnostic method able to provide interpretable and faithful explanations. LORE first learns a local interpretable predictor on a synthetic neighborhood generated by a genetic algorithm. Then it derives from the logic of the local interpretable predictor a meaningful explanation consisting of: a decision rule, which explains the reasons for the decision; and a set of counterfactual rules, suggesting the changes in the instance's features that lead to a different outcome. Wide experiments show that LORE outperforms existing methods and baselines both in the quality of explanations and in the accuracy in mimicking the black box.
Source: ISTI Technical reports, 2018
Project(s): SoBigData via OpenAIRE

See at: arxiv.org Open Access | ISTI Repository | CNR ExploRA

2018 Report Open Access

Open the black box data-driven explanation of black box decision systems
Pedreschi D., Giannotti F., Guidotti R., Monreale A., Pappalardo L., Ruggieri S., Turini F.
Black box systems for automated decision making, often based on machine learning over (big) data, map a user's features into a class or a score without exposing the reasons why. This is problematic not only for lack of transparency, but also for possible biases hidden in the algorithms, due to human prejudices and collection artifacts hidden in the training data, which may lead to unfair or wrong decisions. We introduce the local-to-global framework for black box explanation, a novel approach with promising early results, which paves the road for a wide spectrum of future developments along three dimensions: (i) the language for expressing explanations in terms of highly expressive logic-based rules, with a statistical and causal interpretation; (ii) the inference of local explanations aimed at revealing the logic of the decision adopted for a specific instance by querying and auditing the black box in the vicinity of the target instance; (iii) the bottom-up generalization of the many local explanations into simple global ones, with algorithms that optimize the quality and comprehensibility of explanations.
Source: ISTI Technical reports, 2018
Project(s): SoBigData via OpenAIRE

See at: arxiv.org Open Access | ISTI Repository | CNR ExploRA

2015 Contribution to book Restricted

Clustering formulation using constraint optimization
Grossi V., Monreale A., Nanni M., Pedreschi D., Turini F.
The problem of clustering a set of data is a textbook machine learning problem, but at the same time, at heart, a typical optimization problem. Given an objective function, such as minimizing the intra-cluster distances or maximizing the inter-cluster distances, the task is to find an assignment of data points to clusters that achieves this objective. In this paper, we present a constraint programming model for centroid-based clustering and one for density-based clustering. In particular, as a key contribution, we show how the expressivity introduced by formulating the problem in constraint programming makes the standard problem easy to extend with further constraints that generate interesting variants of the problem. We show this important aspect in two different ways: first, we show how the formulation of density-based clustering by constraint programming makes it very similar to the label propagation problem; then, we propose a variant of the standard label propagation approach.
Source: Software Engineering and Formal Methods, edited by Domenico Bianculli, Radu Calinescu, Bernhard Rumpe, pp. 93–107, 2015
DOI: 10.1007/978-3-662-49224-6_9
Project(s): ICON via OpenAIRE

See at: doi.org Restricted | link.springer.com | CNR ExploRA

2013 Contribution to book Restricted

What else can be extracted from ontologies? Influence rules
Furletti B., Turini F.
A method for extracting new implicit knowledge from ontologies by using an inductive/deductive approach is presented. The newly extracted knowledge takes the form of If-Then rules annotated with a weight. Such rules, termed Influence Rules, specify how the values of the properties bound to a collection of concepts may influence the values of the properties of another concept. The technique, which combines data mining and link analysis, is completely general and applicable to any domain. The paper reports the methods and the algorithms supporting the process of mining the rules out of the ontology, and discusses its application to real data from the economic field.
Source: Software and Data Technologies. Revised selected papers, edited by María José Escalona, José Cordeiro, Boris Shishkov, pp. 270–285. Heidelberg: Springer, 2013
DOI: 10.1007/978-3-642-36177-7_17

See at: doi.org Restricted | link.springer.com | CNR ExploRA

2013 Journal article Open Access

Discrimination discovery in scientific project evaluation: a case study
Romei A., Ruggieri S., Turini F.
Discovering contexts of unfair decisions in a dataset of historical decision records is a non-trivial problem. It requires the design of ad hoc methods and techniques of analysis, which have to comply with existing laws and with legal argumentations. While some data mining techniques have been adapted to the purpose, the state of the art of research still needs methodological refinements, the consolidation of a Knowledge Discovery in Databases (KDD) process, and, most of all, experimentation with real data. This paper contributes by presenting a case study on gender discrimination in a dataset of scientific research proposals, and by distilling from the case study a general discrimination discovery process. Gender bias in scientific research is a challenging problem that has been tackled in the social sciences literature by means of statistical regression. However, this approach is limited to testing a hypothesis of discrimination over the whole dataset under analysis. Our methodology couples data mining, for unveiling previously unknown contexts of possible discrimination, with statistical regression, for testing the significance of such contexts, thus obtaining the best of the two worlds.
Source: Expert systems with applications 40 (2013): 6064–6079. doi:10.1016/j.eswa.2013.05.016
DOI: 10.1016/j.eswa.2013.05.016

See at: Expert Systems with Applications Open Access | Expert Systems with Applications Restricted | www.sciencedirect.com | CNR ExploRA

2013 Contribution to book Restricted

The discovery of discrimination
Pedreschi D., Ruggieri S., Turini F.
Discrimination discovery from data consists in the extraction of discriminatory situations and practices hidden in a large amount of historical decision records. We discuss the challenging problems in discrimination discovery, and present, in a unified form, a framework based on classification rules extraction and filtering on the basis of legally-grounded interestingness measures. The framework is implemented in the publicly available DCUBE tool. As a running example, we use a public dataset on credit scoring.
Source: Discrimination and Privacy in the Information Society. Data Mining and Profiling in Large Databases, edited by Bart Custers, Toon Calders, Bart Schermer, Tal Zarsky, pp. 91–108. Berlin Heidelberg: Springer, 2013
DOI: 10.1007/978-3-642-30487-3_5

See at: doi.org Restricted | link.springer.com | CNR ExploRA

2012 Journal article Restricted

Stream mining: a novel architecture for ensemble-based classification
Grossi V., Turini F.
Mining data streams has become an important and challenging task for a wide range of applications. In these scenarios, data tend to arrive in multiple, rapid and time-varying streams, thus constraining data mining algorithms to look at data only once. Maintaining an accurate model, e.g. a classifier, while the stream goes by requires a smart way of keeping track of the data that have already passed. Such a synthetic structure has to serve two purposes: distilling as much information as possible out of past data and allowing a fast reaction to concept drift, i.e. to the change of the data trend that necessarily affects the model. The paper outlines novel data structures and algorithms to tackle the above problem, when the model mined out of the data is a classifier. The introduced model and the overall ensemble architecture are presented in detail, including how the approach can be extended to treat numerical attributes. A large part of the paper discusses the experiments and the comparisons with several existing systems. The comparisons show that the performance of our system in general, and in particular with respect to the reaction to concept drift, is at the top level.
Source: Knowledge and Information Systems 30 (2012): 247–281. doi:10.1007/s10115-011-0378-4
DOI: 10.1007/s10115-011-0378-4

See at: Knowledge and Information Systems Restricted | link.springer.com | CNR ExploRA

2012 Journal article Restricted

Mining Bayesian networks out of ontologies
Bellandi A., Turini F.
Probabilistic reasoning is an essential feature when dealing with many application domains. Starting with the idea that ontologies are the right way to formalize domain knowledge and that Bayesian networks are the right tool for probabilistic reasoning, we propose an approach for extracting a Bayesian network from a populated ontology and for reasoning over it. The paper presents the theory behind the approach, its design and examples of its use.
Source: Journal of intelligent information systems 38 (2012): 507–532. doi:10.1007/s10844-011-0165-4
DOI: 10.1007/s10844-011-0165-4

See at: Journal of Intelligent Information Systems Restricted | link.springer.com | CNR ExploRA

2012 Journal article Restricted

Knowledge discovery in ontologies
Furletti B., Turini F.
Ontologies allow us to represent knowledge and data in implicit and explicit ways. Implicit knowledge can be derived by means of several deductive logic-based processes. This paper introduces a new way of extracting implicit knowledge from ontologies by means of a sort of link analysis of the T-box of the ontology integrated with a data mining step on the A-box. The implicit extracted knowledge has the form of "Influence Rules", i.e. rules structured as: if the property p1 of concept c1 has value v1, then the property p2 of concept c2 has value v2 with a given probability. The technique is completely general and applicable to any domain. The Influence Rules can be used to integrate existing knowledge or to support any other data mining process. A case study about an ontology describing intrusion detection is used to illustrate the result of the method.
Source: Intelligent data analysis 16 (2012): 513–534. doi:10.3233/IDA-2012-0536
DOI: 10.3233/ida-2012-0536

See at: Intelligent Data Analysis Restricted | CNR ExploRA

2011 Conference article Unknown

Mining influence rules out of ontologies
Furletti B., Turini F.
A method for extracting new implicit knowledge starting from the ontology schema by using an inductive/deductive approach is presented. By giving a new interpretation to relationships that already exist in an ontology, we are able to return the extracted knowledge as weighted If-Then Rules among concepts. The technique, which combines data mining and link analysis, is completely general and applicable to any domain. Since the output is a set of "standard" If-Then Rules, it can be used to integrate existing knowledge or to support any other data mining process. An application of the method to an ontology representing companies and their activities is included.
Source: ICSOFT 2011 - International Conference on Software and Data Technologies, pp. 323–333, Seville, Spain, 18-21 July 2011

See at: CNR ExploRA

2011 Conference article Open Access

k-NN as an implementation of situation testing for discrimination discovery and prevention
Luong B. T., Ruggieri S., Turini F.
With the support of the legally-grounded methodology of situation testing, we tackle the problems of discrimination discovery and prevention from a dataset of historical decisions by adopting a variant of k-NN classification. A tuple is labeled as discriminated if we can observe a significant difference of treatment among its neighbors belonging to a protected-by-law group and its neighbors not belonging to it. Discrimination discovery boils down to extracting a classification model from the labeled tuples. Discrimination prevention is tackled by changing the decision value for tuples labeled as discriminated before training a classifier. The approach of this paper overcomes legal weaknesses and technical limitations of existing proposals.
Source: 17th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '11, pp. 502–510, San Diego, California, USA, August 21-24 2011
DOI: 10.1145/2020408.2020488

See at: www.di.unipi.it Open Access | doi.org Restricted | CNR ExploRA
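
The labeling step described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout, the Euclidean distance, the value of `k`, and the fixed `threshold` that stands in for the "significant difference" test are all assumptions made for illustration, and the function name `situation_testing` is hypothetical.

```python
from math import dist

def situation_testing(records, k=2, threshold=0.5):
    """Label records as discriminated using a k-NN comparison.

    Each record is (features, protected, denied). A protected record with a
    negative decision is labeled True when the denial rate among its k nearest
    protected neighbors exceeds the denial rate among its k nearest
    unprotected neighbors by at least `threshold` (an assumed cut-off).
    """
    labels = []
    for i, (x, protected, denied) in enumerate(records):
        if not (protected and denied):
            labels.append(False)
            continue
        others = [r for j, r in enumerate(records) if j != i]

        def denial_rate(group):
            # k nearest neighbors of x inside the given group
            members = sorted((r for r in others if r[1] == group),
                             key=lambda r: dist(r[0], x))[:k]
            if not members:
                return None
            return sum(r[2] for r in members) / len(members)

        p_protected = denial_rate(True)
        p_unprotected = denial_rate(False)
        if p_protected is None or p_unprotected is None:
            labels.append(False)
        else:
            labels.append(p_protected - p_unprotected >= threshold)
    return labels

# Toy data: (features, protected group?, benefit denied?)
records = [
    ((1.0, 1.0), True, True),
    ((1.1, 1.0), True, True),
    ((0.9, 1.1), True, True),
    ((1.0, 0.9), False, False),
    ((1.2, 1.1), False, False),
    ((5.0, 5.0), True, False),
    ((5.1, 5.0), False, False),
]
labels = situation_testing(records)
print(labels)
```

In this toy data the first three records are similar to two unprotected records that were not denied, so they are the ones flagged. The labeled tuples could then feed a classifier (discovery) or have their decision value changed before training (prevention), as the abstract outlines.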

2010 Journal article Open Access

Integrating induction and deduction for finding evidence of discrimination
Pedreschi D., Turini F., Ruggieri S.
We present a reference model for finding (prima facie) evidence of discrimination in datasets of historical decision records in socially sensitive tasks, including access to credit, mortgage, insurance, labor market and other benefits. We formalize the process of direct and indirect discrimination discovery in a rule-based framework, by modelling protected-by-law groups, such as minorities or disadvantaged segments, and contexts where discrimination occurs. Classification rules, extracted from the historical records, allow for unveiling contexts of unlawful discrimination, where the degree of burden over protected-by-law groups is evaluated by formalizing existing norms and regulations in terms of quantitative measures. The measures are defined as functions of the contingency table of a classification rule, and their statistical significance is assessed, relying on a large body of statistical inference methods for proportions. Key legal concepts and reasonings are then used to drive the analysis on the set of classification rules, with the aim of discovering patterns of discrimination, either direct or indirect. Analyses of affirmative action, favoritism and argumentation against discrimination allegations are also modelled in the proposed framework. Finally, we present an implementation, called LP2DD, of the overall reference model that integrates induction, through data mining classification rule extraction, and deduction, through a computational logic implementation of the analytical tools. The LP2DD system is put to work on the analysis of a dataset of credit decision records.
Source: Artificial intelligence and law (Dordr., Online) 18 (2010): 1–43. doi:10.1007/s10506-010-9089-5
DOI: 10.1007/s10506-010-9089-5
DOI: 10.1145/1568234.1568252

See at: Artificial Intelligence and Law Open Access | Artificial Intelligence and Law Restricted | doi.org | link.springer.com | CNR ExploRA

2010 Conference article Closed Access

DCUBE: Discrimination Discovery in Databases
Pedreschi D., Turini F., Ruggieri S.
Discrimination discovery in databases consists in finding unfair practices against minorities which are hidden in a dataset of historical decisions. The DCUBE system implements the approach of [5], which is based on classification rule extraction and analysis, by centering the analysis phase around an Oracle database. The proposed demonstration guides the audience through the legal issues about discrimination hidden in data, and through several legally-grounded analyses to unveil discriminatory situations. The SIGMOD attendees will freely pose complex discrimination analysis queries over the database of extracted classification rules, once they are presented with the database relational schema, a few ad-hoc functions and procedures, and several snippets of SQL queries for discrimination discovery.
Source: ACM International Conference on Management of Data (SIGMOD 2010), pp. 1127–1130, Indianapolis, IN, 6-11 June 2010
DOI: 10.1145/1807167.1807298

See at: dl.acm.org Restricted | doi.org | CNR ExploRA

2010 Journal article Open Access

Data mining for discrimination discovery
Pedreschi D., Ruggieri S., Turini F.
In the context of civil rights law, discrimination refers to unfair or unequal treatment of people based on membership to a category or a minority, without regard to individual merit. Discrimination in credit, mortgage, insurance, labor market, and education has been investigated by researchers in economics and human sciences. With the advent of automatic decision support systems, such as credit scoring systems, the ease of data collection opens several challenges to data analysts for the fight against discrimination. In this article, we introduce the problem of discovering discrimination through data mining in a dataset of historical decision records, taken by humans or by automatic systems. We formalize the processes of direct and indirect discrimination discovery by modelling protected-by-law groups and contexts where discrimination occurs in a classification rule based syntax. Basically, classification rules extracted from the dataset allow for unveiling contexts of unlawful discrimination, where the degree of burden over protected-by-law groups is formalized by an extension of the lift measure of a classification rule. In direct discrimination, the extracted rules can be directly mined in search of discriminatory contexts. In indirect discrimination, the mining process needs some background knowledge as a further input, for example, census data, that combined with the extracted rules might allow for unveiling contexts of discriminatory decisions. A strategy adopted for combining extracted classification rules with background knowledge is called an inference model. In this article, we propose two inference models and provide automatic procedures for their implementation. An empirical assessment of our results is provided on the German credit dataset and on the PKDD Discovery Challenge 1999 financial dataset.
Source: ACM transactions on knowledge discovery from data 4 (2010): 1–40. doi:10.1145/1754428.1754432
DOI: 10.1145/1754428.1754432

See at: ACM Transactions on Knowledge Discovery from Data Open Access | ACM Transactions on Knowledge Discovery from Data Restricted | CNR ExploRA
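
The abstract above mentions an extension of the lift measure without spelling it out. The Python sketch below assumes one ratio form, conf(A, B → C) / conf(B → C), where A is the protected-group condition, B the context and C the negative decision; this formula, the helper names `confidence` and `extended_lift`, and the toy rows are assumptions for illustration and are not taken from this listing.

```python
def confidence(rows, premise, outcome):
    """Fraction of rows satisfying `premise` that also satisfy `outcome`."""
    covered = [r for r in rows if premise(r)]
    if not covered:
        return None
    return sum(1 for r in covered if outcome(r)) / len(covered)

def extended_lift(rows, protected, context, outcome):
    """Assumed form: conf(protected AND context -> outcome) / conf(context -> outcome)."""
    conf_with = confidence(rows, lambda r: protected(r) and context(r), outcome)
    conf_base = confidence(rows, context, outcome)
    if conf_with is None or not conf_base:
        return None  # undefined when a rule covers no rows or the base rate is zero
    return conf_with / conf_base

# Toy decision records (hypothetical attributes)
rows = [
    {"group": "minority", "city": "A", "denied": True},
    {"group": "minority", "city": "A", "denied": True},
    {"group": "majority", "city": "A", "denied": True},
    {"group": "majority", "city": "A", "denied": False},
    {"group": "majority", "city": "A", "denied": False},
    {"group": "majority", "city": "A", "denied": False},
    {"group": "minority", "city": "B", "denied": False},
]
elift = extended_lift(
    rows,
    protected=lambda r: r["group"] == "minority",
    context=lambda r: r["city"] == "A",
    outcome=lambda r: r["denied"],
)
print(elift)
```

A value well above 1 would suggest that, within the context, adding the protected-group condition raises the denial rate; which threshold counts as unlawful is a legal question outside the scope of this sketch.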

2010 Journal article Restricted

Improving the business plan evaluation process: the role of intangibles
Turini F., Baglioni M., Bellandi A., Furletti B., Pratesi C.
One of the main objectives of the European MUSING project is to design and test software tools in order to support the activities of small and medium sized businesses. In this paper we examine financial risk management and, more specifically, the self-assessment of business plans. The role of intangible assets is discussed, and we report on how intangible assets can be collected, how they can be represented, taking into account their semantic relationships, and how they can be used to build an analytical tool for business plans. The basic technology embedded in the tool is the construction of classification trees, a well-known technique in inductive learning. We show how using knowledge of intangible assets can improve the construction of the classifier, as proved by the testing carried out so far.
Source: Quality technology & quantitative management (Print) 7 (2010): 35–50. doi:10.1080/16843703.2010.11673217
DOI: 10.1080/16843703.2010.11673217

See at: Quality Technology & Quantitative Management Restricted | www.tandfonline.com | CNR ExploRA

2008 Conference article Open Access

Discrimination-aware data mining
Pedreschi D., Ruggieri S., Turini F.
In the context of civil rights law, discrimination refers to unfair or unequal treatment of people based on membership to a category or a minority, without regard to individual merit. Rules extracted from databases by data mining techniques, such as classification or association rules, when used for decision tasks such as benefit or credit approval, can be discriminatory in the above sense. In this paper, the notion of discriminatory classification rules is introduced and studied. Providing a guarantee of non-discrimination is shown to be a non-trivial task. A naive approach, like taking away all discriminatory attributes, is shown not to be enough when other background knowledge is available. Our approach leads to a precise formulation of the redlining problem along with a formal result relating discriminatory rules with apparently safe ones by means of background knowledge. An empirical assessment of the results on the German credit dataset is also provided.
Source: 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 560–568, Las Vegas, Nevada, USA, August 24-27, 2008
DOI: 10.1145/1401890.1401959

See at: www.di.unipi.it Open Access | dl.acm.org Restricted | doi.org | CNR ExploRA

2008 Journal article Restricted

An Application of Advanced Spatio-Temporal Formalisms to Behavioural Ecology
Raffaetà A., Ceccarelli T., Centeno D., Giannotti F., Massolo A., Parent C., Renso C., Spaccapietra S., Turini F.
There is great potential for the development of many new applications using data on mobile objects and mobile regions. To promote these kinds of applications, advanced data management techniques for the representation and analysis of mobility-related data are needed. Together with application experts (behavioural ecologists), we investigate how two novel data management approaches may help. We focus on a case study concerning the analysis of fauna behaviour, in particular crested porcupines, which represents a typical example of mobile object monitoring. The first technique we experiment with is a recently developed conceptual spatio-temporal data modelling approach, MADS. This is used to model the schema of the database suited to our case study. Relying on this first outcome, a subset of the problem is represented in the logical language MuTACLP. This allows us to formalise and solve the queries which enable the behavioural ecologists to derive crested porcupine behaviour from the raw data on animal movements. Finally, we investigate the support from a commercial Geographic Information System (GIS).
Source: Geoinformatica an international journal 12 (2008): 37–72. doi:10.1007/s10707-006-0016-6
DOI: 10.1007/s10707-006-0016-6

See at: dl.acm.org Restricted | GeoInformatica | Infoscience - EPFL scientific publications | CNR ExploRA

2007 Contribution to book Unknown

Knowledge Discovery from Geographical Data
Rinzivillo S., Turini F., Bogorny V., Komer C., Kuijpers B., May M.
During the last decade, data miners became aware of geographical data. Today, knowledge discovery from geographic data is still an open research field but promises to be a solid starting point for developing solutions for mining spatiotemporal patterns in a knowledge-rich territory. As many concepts of geographic feature extraction and data mining are not commonly known within the data mining community, but need to be understood before advancing to spatiotemporal data mining, this chapter provides an introduction to basic concepts of knowledge discovery from geographical data.
Source: Mobility, Data Mining and Privacy: Geographic Knowledge Discovery, pp. 243–266. Berlin: Springer-Verlag, 2007

See at: CNR ExploRA

2007 Contribution to book Unknown

Privacy Protection: Regulations and Technologies, Opportunities, and Threats
Bonchi F., Pedreschi D., Turini F., Atzori M., Malin B., Moelans B., Saygin Y., Verykios V.
Information and communication technologies (ICTs) touch many aspects of our lives. The integration of ICTs is enhanced by the advent of mobile, wireless, and ubiquitous technologies. ICTs are increasingly embedded in common services, such as mobile and wireless communication, Internet browsing, credit card e-transactions, and electronic health records. As ICT-based technologies become ubiquitous, our everyday actions leave behind increasingly detailed digital traces in the information systems of ICT-based service providers. For example, consumers of mobile-phone technologies leave behind traces of geographic position to cellular provider records, Internet users leave behind traces of the Web page and packet requests of their computers in the access logs of domain and network administrators, and credit card transactions reveal the locations and times where purchases were completed. Traces are an artifact of the design of services, such that their collection and storage are difficult to avoid. To dispatch calls, for instance, the current design of wireless networks requires knowledge of each mobile user's geographic position. Analogously, DNS servers for the Internet need to know IP addresses to dispatch requests from source to destination computers.
Source: Mobility, Data Mining and Privacy: Geographic Knowledge Discovery, edited by F. Giannotti, D. Pedreschi, pp. 101–122. Berlin: Springer-Verlag, 2007

See at: CNR ExploRA